
This paper has been accepted at the Efficient Systems for Foundation Models workshop at ICML 2024.

In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but it additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full rank and are therefore not amenable to low-rank decomposition, which deceptively suggests that the weights utilize the entire space available to them. We propose a simple data-driven transformation that projects the weights onto the subspace in which the data and the weights interact. This preserves the functional mapping of the layer and reveals its low-rank structure. We find that most models utilize only a fraction of the available space. For instance, for ViT-B/16 and ViT-L/16 trained on ImageNet, the mean layer utilization is 35% and 20%, respectively. Our transformation reduces the parameters to 50% and 25% of the original, respectively, with less than a 0.2% accuracy drop after fine-tuning. We also show that self-supervised pre-training drives this utilization up to 70%, justifying its suitability for downstream tasks.
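As a rough sketch of the idea (an illustrative approximation, not the paper's exact procedure), one can project a linear layer's weight onto the subspace spanned by its input activations and measure the rank of the projected weight; the function name, tensor names, and the energy threshold below are placeholder assumptions.

```python
import torch


def data_aware_rank(weight: torch.Tensor,
                    activations: torch.Tensor,
                    energy_threshold: float = 0.99) -> int:
    """Illustrative sketch: rank of a layer's weight restricted to the
    subspace actually spanned by its input activations.

    weight:       (d_out, d_in) layer weight, with outputs y = x @ weight.T
    activations:  (n_samples, d_in) inputs observed for this layer
    """
    # Orthonormal basis of the data subspace via SVD of the activations.
    _, s, vh = torch.linalg.svd(activations, full_matrices=False)
    energy = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    k = int((energy < energy_threshold).sum().item()) + 1
    basis = vh[:k]  # (k, d_in) top right-singular vectors of the data

    # Project the weight onto that subspace; the layer's outputs on the
    # observed data are (approximately) unchanged by this projection.
    projected = weight @ basis.T @ basis  # (d_out, d_in)

    # The rank of the projected weight reflects the utilized dimensionality.
    return int(torch.linalg.matrix_rank(projected).item())
```

In a compressed form, the layer could store the factored pair `weight @ basis.T` (d_out x k) and `basis` (k x d_in) instead of the full d_out x d_in weight, which is where the parameter reduction in the abstract would come from.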

Related readings and updates.

Neural Fisher Kernel: Low-rank Approximation and Knowledge Distillation

In this paper, we study the representation of neural networks from the view of kernels. We first define the Neural Fisher Kernel (NFK), which is the Fisher Kernel applied to neural networks. We show that NFK can be computed for both supervised and unsupervised learning models, which can serve as a unified tool for representation extraction. Furthermore, we show that practical NFKs exhibit low-rank structures. We then propose an efficient…
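As a loose illustration of the general idea (not the paper's algorithm), an unnormalized Fisher-kernel entry between two examples can be sketched as the inner product of their per-example loss gradients; the `model`, `loss_fn`, and example tensors below are placeholder assumptions, and the usual Fisher-information preconditioning is omitted for brevity.

```python
import torch


def fisher_kernel_entry(model: torch.nn.Module, loss_fn,
                        x_i: torch.Tensor, y_i: torch.Tensor,
                        x_j: torch.Tensor, y_j: torch.Tensor) -> float:
    """Illustrative sketch: k(i, j) ~= <grad_theta L(x_i), grad_theta L(x_j)>.
    The full Fisher kernel additionally preconditions by the inverse Fisher
    information matrix, which is omitted here for brevity."""
    params = [p for p in model.parameters() if p.requires_grad]

    def flat_grad(x, y):
        # Per-example gradient of the loss w.r.t. all trainable parameters,
        # flattened into a single vector. Inputs are assumed to be batched
        # tensors containing a single example.
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        return torch.cat([g.reshape(-1) for g in grads])

    return float(flat_grad(x_i, y_i) @ flat_grad(x_j, y_j))
```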

Improving Neural Network Subspaces

In spite of the success of deep learning, we know relatively little about the many possible solutions to which a trained network can converge. During training, networks generally converge to some local minimum of their loss function, a region of parameter space from which the loss increases in every direction. Our research explores why some local minima outperform others when the trained network is evaluated on a held-out test set.
